Machine Learning Bookcamp: Build a portfolio of real-life projects by Alexey Grigorev


Author: Alexey Grigorev [Grigorev, Alexey]
Language: eng
Format: epub, mobi
Publisher: Manning Publications Co.
Published: 2021-10-19T22:00:00+00:00


Figure 6.9 A tree with more levels can learn more complex rules. A tree with two levels is less complex than a tree with three levels and, thus, less prone to overfitting.

The default value for the max_depth parameter is None, which means that the depth is unlimited: the tree keeps splitting until every leaf is pure (or too small to split further). We can try a smaller value and compare the results.

For example, we can change it to 2:

from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(max_depth=2)
dt.fit(X_train, y_train)
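One way to "compare the results" is to train both an unrestricted tree and a depth-2 tree and look at AUC on the training and validation data. The sketch below assumes synthetic data from make_classification in place of the chapter's credit-scoring dataset, so the exact numbers will differ; the point is the pattern: the unrestricted tree memorizes the training data (training AUC of 1.0), while the depth-2 tree cannot.

```python
# Sketch: compare a fully grown tree (max_depth=None) with a depth-2 tree.
# Assumption: synthetic data from make_classification stands in for the
# chapter's credit-scoring dataset.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=1
)

results = {}
for depth in [None, 2]:
    dt = DecisionTreeClassifier(max_depth=depth, random_state=1)
    dt.fit(X_train, y_train)
    # AUC needs scores, so we use the predicted probability of the positive class
    train_auc = roc_auc_score(y_train, dt.predict_proba(X_train)[:, 1])
    val_auc = roc_auc_score(y_val, dt.predict_proba(X_val)[:, 1])
    results[depth] = (train_auc, val_auc, dt.get_depth())
    print(f"max_depth={depth}: train AUC={train_auc:.3f}, "
          f"val AUC={val_auc:.3f}, actual depth={dt.get_depth()}")
```

A large gap between training and validation AUC for the unrestricted tree is the overfitting symptom that limiting max_depth is meant to reduce.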

To visualize the tree we just learned, we can use the export_text function from Scikit-learn's tree package:

from sklearn.tree import export_text

tree_text = export_text(dt, feature_names=dv.feature_names_)
print(tree_text)

We only need to pass the names of the features through the feature_names parameter; we can get them from the feature_names_ attribute of the DictVectorizer. When we print the result, we get the following:
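To see how the DictVectorizer's feature names line up with the columns that export_text reports, here is a small self-contained sketch. The six customers and their labels are hypothetical toy data that only reuse the chapter's feature names (records, job, seniority); they are not taken from the book's dataset, so the printed tree will not match the one shown here.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy customers; only the feature names follow the chapter
customers = [
    {"records": "no", "job": "fixed", "seniority": 10},
    {"records": "no", "job": "parttime", "seniority": 1},
    {"records": "yes", "job": "fixed", "seniority": 2},
    {"records": "yes", "job": "fixed", "seniority": 12},
    {"records": "no", "job": "fixed", "seniority": 3},
    {"records": "yes", "job": "parttime", "seniority": 4},
]
defaulted = [False, True, True, False, False, True]

# One-hot encode the categorical values; numeric values pass through unchanged
dv = DictVectorizer(sparse=False)
X = dv.fit_transform(customers)

dt = DecisionTreeClassifier(max_depth=2, random_state=1)
dt.fit(X, defaulted)

# feature_names_ lists the columns of X in order, e.g. "records=no"
print(dv.feature_names_)
tree_text = export_text(dt, feature_names=dv.feature_names_)
print(tree_text)
```

Because DictVectorizer sorts feature names by default, the columns come out as job=fixed, job=parttime, records=no, records=yes, seniority, and export_text uses exactly these names in its conditions.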

|--- records=no <= 0.50
|   |--- seniority <= 6.50
|   |   |--- class: True
|   |--- seniority > 6.50
|   |   |--- class: False
|--- records=no > 0.50
|   |--- job=parttime <= 0.50
|   |   |--- class: False
|   |--- job=parttime > 0.50
|   |   |--- class: True

Each line in the output corresponds to a node with a condition. If the condition is true, we descend into that node and repeat the process until we arrive at the final decision. At the end, if class is True, then the decision is “default,” and otherwise it’s “OK.”
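To make the reading procedure concrete, the depth-2 tree shown in the text can be hand-translated into nested if statements. This is an illustrative sketch, not code from the book: predict_default is a hypothetical helper, and it assumes the customer has already been one-hot encoded into a dictionary holding the three columns the tree uses.

```python
def predict_default(features: dict) -> bool:
    """Walk the depth-2 tree from the text for one encoded customer.

    Assumes `features` has the keys 'records=no', 'seniority', and
    'job=parttime'. Returns True for "default" and False for "OK".
    """
    if features["records=no"] <= 0.50:
        # Left branch: the customer has records
        if features["seniority"] <= 6.50:
            return True
        return False
    # Right branch (records=no > 0.50): the customer has no records
    if features["job=parttime"] <= 0.50:
        return False
    return True


# A customer with records and 3 years of seniority ends up in "default"
print(predict_default({"records=no": 0.0, "seniority": 3, "job=parttime": 0.0}))  # True
# A customer with no records and a part-time job also ends up in "default"
print(predict_default({"records=no": 1.0, "seniority": 3, "job=parttime": 1.0}))  # True
```

Each if corresponds to one indentation level in the export_text output, which is why a tree of depth 2 needs at most two comparisons per prediction.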

The condition records=no > 0.50 means that a customer has no records. Recall that we use one-hot encoding to represent records with two features: records=yes and records=no. For a customer with no records, records=no is set to “1” and records=yes to “0.” Thus, “records=no > 0.50” is true when the value for records is no (figure 6.10).
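The threshold 0.50 looks odd for a yes/no feature until we recall that a one-hot column only ever holds 0 or 1. Scikit-learn places split thresholds halfway between adjacent observed values, so for a 0/1 column the only useful threshold is 0.50, and "> 0.50" simply means "equals 1." The sketch below uses a hypothetical helper, one_hot_records, that mimics what DictVectorizer does for this single feature; it is not the DictVectorizer API itself.

```python
# Hypothetical helper that mimics how a one-hot encoder such as
# DictVectorizer turns the categorical "records" value into two columns.
def one_hot_records(value: str) -> dict:
    return {
        "records=no": 1.0 if value == "no" else 0.0,
        "records=yes": 1.0 if value == "yes" else 0.0,
    }


for value in ["no", "yes"]:
    encoded = one_hot_records(value)
    # The column is either 0.0 or 1.0, so "> 0.50" is the same as "== 1.0"
    print(value, encoded, encoded["records=no"] > 0.50)
# no {'records=no': 1.0, 'records=yes': 0.0} True
# yes {'records=no': 0.0, 'records=yes': 1.0} False
```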





